DC AirBnB vs Zillow Listings

This is a brief exploratory data analysis of AirBnB listings versus Zillow listings for Washington, DC, using web-scraped data from roughly June 2024. It turned up no particularly interesting findings. I was trying out R (this is my first-ever R Markdown document) as part of the final project for CS50’s Introduction to Programming with R.

I’m playing with R visualizations.

Data Preparation and Cleaning

Loading the Zillow data:

## # A tibble: 6 × 13
##   zpid    homeStatus marketingStatus  price latitude longitude  beds baths  area
##   <chr>   <chr>      <chr>            <int>    <dbl>     <dbl> <int> <int> <int>
## 1 121035… ComingSoon Coming Soon     4.85e5     38.9     -77.1     2     2  1086
## 2 425748  ForSale    For Sale by Ag… 1.15e6     38.9     -77.0     3     4  2540
## 3 435451  ForSale    New Constructi… 2.90e7     38.9     -77.1     5     9 16250
## 4 458388  ComingSoon Coming Soon     1.20e6     39.0     -77.1     4     4  2089
## 5 344613… ForSale    For Sale by Ag… 4.26e5     38.9     -77.0     3     2  4800
## 6 351580… ForSale    For Sale by Ag… 3.25e5     38.9     -77.1     2     1   727
## # ℹ 4 more variables: zestimate <int>, rentZestimate <int>,
## #   taxAssessedValue <int>, url <chr>
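As a rough illustration of the loading step, here is a minimal sketch of reading Zillow-style records with `jsonlite`. The JSON string is a hypothetical two-record sample using a few of the columns shown above; the real scrape's file name and layout are not shown in this document, so treat the structure as an assumption.

```r
library(jsonlite)

# Hypothetical inline sample; the actual scraped file is not shown here.
zillow_json <- '[
  {"zpid": "12103562", "homeStatus": "ComingSoon", "price": 485000, "beds": 2},
  {"zpid": "425748", "homeStatus": "ForSale", "price": 1150000, "beds": 3}
]'

# fromJSON() converts a JSON array of objects into a data frame,
# one row per record and one column per key.
zillow_sample <- fromJSON(zillow_json)
str(zillow_sample)
```

With a file on disk, the same call takes a path instead of a string.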

Loading the AirBnB data. This step drops a few erroneous rows (hostel beds listed at $7000 that should be $70):

## # A tibble: 6 × 25
##   listing_id latitude longitude hover_description              listing_url price
##        <dbl>    <dbl>     <dbl> <chr>                          <chr>       <dbl>
## 1       3686     38.9     -77.0 Vita's Hideaway                https://ww…   67 
## 2       3943     38.9     -77.0 Historic Rowhouse Near Monume… https://ww…   82 
## 3       4197     38.9     -77.0 Capitol Hill Bedroom walk to … https://ww…  135 
## 4       4529     38.9     -76.9 Bertina's  House Part One      https://ww…   66 
## 5       5589     38.9     -77.0 Cozy apt in Adams Morgan       https://ww…  130.
## 6     178395     39.0     -77.0 Spare Room for Washington,DC … https://ww…  399 
## # ℹ 19 more variables: property_type <chr>, room_type <chr>,
## #   accommodates <dbl>, number_of_reviews <dbl>, number_of_reviews_ltm <dbl>,
## #   number_of_reviews_l30d <dbl>, first_review <date>, last_review <date>,
## #   review_scores_rating <dbl>, reviews_per_month <dbl>, host_id <dbl>,
## #   host_name <chr>, host_identity_verified <lgl>, host_listings_count <dbl>,
## #   host_total_listings_count <dbl>, license <chr>, neighborhood <chr>,
## #   minimum_nights <dbl>, shortNeighborhood <chr>
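One possible way to drop price errors like these is a simple rule on `room_type` and `price`. This is only a sketch: the cutoff `max_shared_price`, the sample rows, and the rule itself are assumptions for illustration, not the filter actually used here.

```r
# Hypothetical sample: one plausible hostel bed, one mispriced, one whole home.
airbnb_sample <- data.frame(
  listing_id = c(1, 2, 3),
  room_type  = c("Shared room", "Shared room", "Entire home/apt"),
  price      = c(70, 7000, 399)
)

# Assumed cutoff: a shared room above this nightly price is treated as an error.
max_shared_price <- 1000

is_error <- airbnb_sample$room_type == "Shared room" &
  airbnb_sample$price > max_shared_price

# Keep only the rows that were not flagged.
airbnb_clean <- airbnb_sample[!is_error, ]
airbnb_clean
```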

Assign neighborhoods to the Zillow data. This drops 6 rows lacking lat/long coordinates.

## # A tibble: 6 × 15
##   zpid      homeStatus marketingStatus      price  beds baths  area zestimate
##   <chr>     <chr>      <chr>                <int> <int> <int> <int>     <int>
## 1 12103562  ComingSoon Coming Soon         485000     2     2  1086    501400
## 2 425748    ForSale    For Sale by Agent  1150000     3     4  2540   1139800
## 3 435451    ForSale    New Construction  28995000     5     9 16250        NA
## 4 458388    ComingSoon Coming Soon        1195000     4     4  2089   1199000
## 5 344613704 ForSale    For Sale by Agent   425900     3     2  4800    403100
## 6 351580057 ForSale    For Sale by Agent   325000     2     1   727        NA
## # ℹ 7 more variables: rentZestimate <int>, taxAssessedValue <int>, url <chr>,
## #   neighborhood <chr>, shortNeighborhood <chr>, latitude <dbl>,
## #   longitude <dbl>
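Dropping rows without coordinates can be sketched in base R as follows. The sample data frame is hypothetical, and the neighborhood lookup itself (matching each point to a neighborhood boundary) is not reproduced here.

```r
# Hypothetical sample with two rows missing a coordinate.
zillow_sample <- data.frame(
  zpid      = c("a", "b", "c"),
  latitude  = c(38.9, NA, 39.0),
  longitude = c(-77.1, -77.0, NA)
)

# A row is usable only if both latitude and longitude are present.
has_coords <- complete.cases(zillow_sample[, c("latitude", "longitude")])

zillow_located <- zillow_sample[has_coords, ]
n_dropped <- sum(!has_coords)
n_dropped
```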

Verify that neighborhoods have been assigned correctly. The black points are in DC; the red points are in Virginia and Maryland.

The red points show up as “Unknown” in the neighborhood counts that follow:

## # A tibble: 40 × 2
##    shortNeighborhood               n
##    <chr>                       <int>
##  1 "Unknown"                    2624
##  2 "NW-mid Brightwood Park, C"   210
##  3 "NE Ivy City, Arboretum, T"   179
##  4 "NW-mid Columbia Heights, "   179
##  5 "NE/NW Edgewood, Bloomingd"   165
##  6 "NE Union Station, Stanton"   140
##  7 "SW Southwest Employment A"   113
##  8 "NW-mid Downtown, Chinatow"    91
##  9 "NW-mid Dupont Circle, Con"    91
## 10 "SE Capitol Hill, Lincoln "    89
## # ℹ 30 more rows
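A count table like this one, sorted from largest to smallest, can be sketched in base R. The neighborhood labels are made up for illustration, and base R is used only so the sketch runs without extra packages.

```r
# Hypothetical neighborhood labels, one per listing.
neighborhoods <- c("Unknown", "Dupont", "Unknown",
                   "Capitol Hill", "Dupont", "Unknown")

# Tabulate listings per neighborhood and convert to a data frame with column n.
counts <- as.data.frame(
  table(shortNeighborhood = neighborhoods),
  responseName = "n",
  stringsAsFactors = FALSE
)

# Sort descending by count, then drop the out-of-DC "Unknown" bucket.
counts <- counts[order(-counts$n), ]
counts_dc <- counts[counts$shortNeighborhood != "Unknown", ]
counts_dc
```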

Map Zillow and AirBnB properties by neighborhood

Drop the “Unknown” neighborhoods and check that the remaining neighborhoods are coded correctly:

And the AirBnB listings:

Average Price

## # A tibble: 40 × 5
##    shortNeighborhood           avg_price_zillow avg_price_airbnb change_zillow
##    <chr>                       <chr>            <chr>            <chr>        
##  1 "NE Brookland, Brentwood, " $875,089.21      $135.76          -12.96%      
##  2 "NE Deanwood, Burrville, G" $450,288.53      $125.89          -55.21%      
##  3 "NE Eastland Gardens, Keni" $424,333.33      $91.00           -57.80%      
##  4 "NE Ivy City, Arboretum, T" $656,705.80      $148.19          -34.68%      
##  5 "NE Mayfair, Hillbrook, Ma" $451,578.81      $135.47          -55.09%      
##  6 "NE North Michigan Park, M" $762,030.60      $108.62          -24.21%      
##  7 "NE Union Station, Stanton" $999,628.07      $173.22          -0.58%       
##  8 "NE Woodridge, Fort Lincol" $700,391.23      $178.15          -30.34%      
##  9 "NE/NW Edgewood, Bloomingd" $826,102.61      $129.07          -17.83%      
## 10 "NE/NW Lamont Riggs, Queen" $644,808.97      $101.35          -35.87%      
## # ℹ 30 more rows
## # ℹ 1 more variable: change_airbnb <chr>
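The `change_*` columns are not defined in the text. The Zillow rows shown are consistent with a percent difference from a baseline of roughly $1.005M, which could be the overall average price, but that reading is an assumption. Under that assumption, a minimal base R sketch with made-up prices looks like this:

```r
# Hypothetical listings in two neighborhoods.
listings <- data.frame(
  shortNeighborhood = c("A", "A", "B", "B"),
  price             = c(100, 200, 300, 400)
)

# Average price per neighborhood.
avg <- aggregate(price ~ shortNeighborhood, data = listings, FUN = mean)

# Assumed definition: percent difference from the overall mean price.
overall_mean <- mean(listings$price)
avg$change <- (avg$price - overall_mean) / overall_mean * 100

# Format as a percentage string, as in the table above.
avg$change_label <- sprintf("%.2f%%", avg$change)
avg
```

Because a percent change of this kind is a linear rescaling of the average price, a scatterplot of the two change columns shows the same relationship as a scatterplot of the two averages.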

Create a scatterplot comparing the percent change in Zillow prices against the percent change in AirBnB prices:

## `geom_smooth()` using formula = 'y ~ x'

It doesn’t look like much of a correlation. Let’s check with a linear fit:

## 
## Call:
## lm(formula = avg_price_airbnb ~ avg_price_zillow, data = combined_avg_price)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -68.612 -31.890  -4.078  19.389 156.171 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      1.540e+02  1.230e+01  12.518 4.69e-15 ***
## avg_price_zillow 1.323e-05  9.369e-06   1.412    0.166    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 49.27 on 38 degrees of freedom
## Multiple R-squared:  0.04985,    Adjusted R-squared:  0.02485 
## F-statistic: 1.994 on 1 and 38 DF,  p-value: 0.1661
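To read the key numbers out of a fit like this programmatically, something along these lines should work. The data here are simulated (the `demo` data frame is hypothetical); only the column names and the model formula follow the call shown above.

```r
set.seed(1)

# Simulated stand-in for the 40 neighborhood averages (not the real data).
demo <- data.frame(avg_price_zillow = runif(40, 3e5, 1.5e6))
demo$avg_price_airbnb <- 150 + rnorm(40, sd = 50)

fit <- lm(avg_price_airbnb ~ avg_price_zillow, data = demo)
fit_summary <- summary(fit)

# R-squared and the p-value of the slope coefficient.
r_squared <- fit_summary$r.squared
slope_p <- coef(fit_summary)["avg_price_zillow", "Pr(>|t|)"]

# With a single predictor, R-squared equals the squared Pearson correlation.
r <- cor(demo$avg_price_zillow, demo$avg_price_airbnb)
c(r_squared = r_squared, r_from_cor_squared = r^2, slope_p = slope_p)
```

Applied to the output above, an R-squared of 0.04985 corresponds to a correlation of about 0.22, and the slope's p-value of 0.166 is well above the usual 0.05 threshold.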

Conclusion:

At the neighborhood level, average AirBnB prices show no statistically significant linear relationship with average Zillow prices (R-squared of about 0.05, p of about 0.17). Perhaps the neighborhoods are not subdivided finely enough, or perhaps vacationers are looking for different things than home buyers.

Price Histograms

Median Price

## # A tibble: 39 × 2
##    shortNeighborhood           median_airbnb_price
##    <chr>                                     <dbl>
##  1 "NE Brookland, Brentwood, "               116. 
##  2 "NE Deanwood, Burrville, G"                82.5
##  3 "NE Eastland Gardens, Keni"                68  
##  4 "NE Ivy City, Arboretum, T"               129  
##  5 "NE Mayfair, Hillbrook, Ma"                81  
##  6 "NE North Michigan Park, M"                98  
##  7 "NE Union Station, Stanton"               150  
##  8 "NE Woodridge, Fort Lincol"               131  
##  9 "NE/NW Edgewood, Bloomingd"               120  
## 10 "NE/NW Lamont Riggs, Queen"                92  
## # ℹ 29 more rows
## # A tibble: 39 × 2
##    shortNeighborhood           median_zillow_price
##    <chr>                                     <dbl>
##  1 "NE Brookland, Brentwood, "             764450 
##  2 "NE Deanwood, Burrville, G"             420000 
##  3 "NE Eastland Gardens, Keni"             475000 
##  4 "NE Ivy City, Arboretum, T"             599000 
##  5 "NE Mayfair, Hillbrook, Ma"             450000 
##  6 "NE North Michigan Park, M"             649950.
##  7 "NE Union Station, Stanton"             850000 
##  8 "NE Woodridge, Fort Lincol"             687475 
##  9 "NE/NW Edgewood, Bloomingd"             739000 
## 10 "NE/NW Lamont Riggs, Queen"             634950 
## # ℹ 29 more rows

Heatmap of Price by Neighborhood:

Listings per Neighborhood

Hypothetical ROI

Let’s estimate the return on investment (ROI) of hypothetically buying a house to rent out as an AirBnB, using reviews per month as a proxy for bookings. Note that this is extremely crude and assumption-laden. Neighborhoods are large blocks with a lot of internal diversity, the type of rental (whole property or shared room) is not taken into account, and each review is assumed, off the cuff, to represent two nights booked. One possible refinement would be to base the estimate on future availability, although that is itself an assumption-laden approach.

## # A tibble: 39 × 7
##    shortNeighborhood      roi   avg_airbnb_price avg_reviews median_zillow_price
##    <chr>                  <chr> <chr>                  <dbl> <chr>              
##  1 SE Fairfax Village, N… 4.09… $161.93                 1.87 $177,500.00        
##  2 NW-far Cathedral Heig… 3.69… $211.92                 2.67 $368,000.00        
##  3 SW Southwest Employme… 2.78… $323.01                 2.01 $559,900.00        
##  4 SE Near Southeast, Na… 2.37… $262.78                 2.28 $604,950.00        
##  5 NW-far North Clevelan… 2.27… $192.24                 1.91 $387,000.00        
##  6 SE Woodland/Fort Stan… 2.26… $164.40                 1.71 $299,000.00        
##  7 NW-mid Downtown, Chin… 2.25… $300.02                 1.53 $489,900.00        
##  8 SE Twining, Fairlawn,… 2.17… $154.95                 2.51 $430,000.00        
##  9 NW-mid West End, Fogg… 2.12… $189.69                 3.03 $649,000.00        
## 10 NW-mid Dupont Circle,… 1.81… $207.48                 2.30 $629,000.00        
## # ℹ 29 more rows
## # ℹ 2 more variables: median_z_scaled <dbl[,1]>, estimated_revenue <chr>
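The formula behind `roi` is not spelled out, but the displayed rows are consistent with annual revenue (average nightly price × reviews per month × 2 nights × 12 months) divided by the median Zillow price. The sketch below recomputes the first two rows under that inferred formula, using the rounded values shown in the table; the row labels are placeholders because the neighborhood names are truncated.

```r
nights_per_review <- 2   # the off-the-cuff assumption stated in the text
months_per_year   <- 12

# First two rows of the table above, with values as displayed (rounded).
roi_sample <- data.frame(
  row_label           = c("row 1", "row 2"),
  avg_airbnb_price    = c(161.93, 211.92),
  avg_reviews         = c(1.87, 2.67),
  median_zillow_price = c(177500, 368000)
)

# Inferred formula: yearly gross revenue relative to the purchase price.
roi_sample$estimated_revenue <- roi_sample$avg_airbnb_price *
  roi_sample$avg_reviews * nights_per_review * months_per_year
roi_sample$roi <- roi_sample$estimated_revenue / roi_sample$median_zillow_price

round(roi_sample$roi, 4)
```

For row 1 this gives roughly $7,267 of estimated yearly revenue against a $177,500 median price, or about 4.1%, matching the maximum in the ROI summary. This is gross revenue only; costs such as taxes, fees, and maintenance are not part of the estimate.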

ROI per neighborhood distribution:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00182 0.00750 0.01226 0.01392 0.01791 0.04094

Some Maps
